Pre-trained Text Classifiers

Text classification can be used to auto-tag customer queries, gauge audience sentiment on social media, or sort articles and blogs into predefined topics. Smart Bot provides several of the most common pre-trained text classifiers.

Sentiment SMS Classifier: This classifier is trained on the Sentiment140 dataset, which contains 1,600,000 tweets extracted using the Twitter API. Each tweet is annotated with a polarity (0 = negative, 4 = positive), which makes the data suitable for sentiment detection. The classifier labels a message (tweet) as having positive or negative sentiment. The source of the dataset used to train this classifier can be found at: http://help.sentiment140.com/for-students/
News Classifier: This classifier is trained on a public dataset of BBC news articles. It classifies a news article into one of five categories: business, politics, sport, entertainment, and technology. The dataset used to train this classifier can be found at: https://www.kaggle.com/yufengdev/bbc-fulltext-and-category.
Spam SMS Classifier: This classifier is trained on a set of tagged SMS messages collected for SMS spam research. The set contains 5,574 English SMS messages, each tagged as ham (legitimate) or spam. The files contain one message per line, and each line has two columns: the first contains the label (ham or spam) and the second contains the raw text. The dataset used to train this classifier can be found at: http://dcomp.sor.ufscar.br/talmeida/smsspamcollection/.
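The raw formats described above can be turned into (label, text) pairs with a short Python sketch. It assumes the Sentiment140 CSV layout of six fields (polarity, id, date, query, user, text) and a tab between the label and the message in the SMS Spam Collection file. The function names parse_sentiment140 and parse_sms_spam, and the sample rows, are hypothetical and are not part of Smart Bot.

```python
import csv
import io

# Assumed Sentiment140 layout: polarity, id, date, query, user, text.
# Polarity codes as described above: 0 = negative, 4 = positive.
SENTIMENT140_LABELS = {"0": "negative", "4": "positive"}


def parse_sentiment140(csv_text):
    """Return (label, tweet_text) pairs from Sentiment140-style CSV text."""
    pairs = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 6:
            continue  # skip blank or malformed rows
        label = SENTIMENT140_LABELS.get(row[0])
        if label is None:
            continue  # ignore polarity codes other than 0 and 4
        pairs.append((label, row[5]))
    return pairs


def parse_sms_spam(tsv_text):
    """Return (label, message) pairs from 'label<TAB>message' lines (tab is assumed)."""
    pairs = []
    for line in tsv_text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        label, message = line.split("\t", 1)
        if label not in ("ham", "spam"):
            raise ValueError(f"unexpected label: {label!r}")
        pairs.append((label, message))
    return pairs


# Hypothetical sample rows, not taken from either dataset.
tweets = (
    '"0","1","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","user1","my day was awful"\n'
    '"4","2","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","user2","loving this weather"\n'
)
sms = "ham\tSee you at lunch\nspam\tWIN a free prize now\n"

print(parse_sentiment140(tweets))
# [('negative', 'my day was awful'), ('positive', 'loving this weather')]
print(parse_sms_spam(sms))
# [('ham', 'See you at lunch'), ('spam', 'WIN a free prize now')]
```

Mapping the numeric polarity codes to readable labels up front lets both datasets feed the same downstream code as plain (label, text) pairs.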
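To illustrate how a classifier can learn from labeled (label, text) pairs like these, here is a minimal multinomial Naive Bayes sketch in Python. It is a generic textbook technique, not necessarily how the Smart Bot classifiers are built; the class name NaiveBayesTextClassifier and the tiny training set are hypothetical. Because it handles any number of labels, the same code applies to the two-class sentiment and spam tasks and to the five News Classifier categories.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, examples):
        # examples: iterable of (label, text) pairs
        self.doc_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for label, text in examples:
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = sum(self.doc_counts.values())
        self.total_words = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # log prior plus the smoothed log likelihood of each token
            score = math.log(n_docs / self.total_docs)
            denom = self.total_words[label] + vocab_size
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training messages, not taken from the SMS Spam Collection.
train = [
    ("spam", "WINNER! Claim your free prize now"),
    ("spam", "Free entry: text WIN to claim a prize"),
    ("ham", "Are we still meeting for lunch today?"),
    ("ham", "I'll call you when I get home"),
]
clf = NaiveBayesTextClassifier().fit(train)
print(clf.predict("Claim your free prize"))      # spam
print(clf.predict("call me when you get home"))  # ham
```

Add-one smoothing keeps a word that never appeared with a label from driving that label's probability to zero, and working in log space avoids underflow when multiplying many small probabilities.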